ABSTRACT

In fiber-fed galaxy redshift surveys, the finite size of the fiber plugs prevents two fibers from being 
placed too close to one another, limiting the ability to study galaxy clustering on all scales. We
present a new method for correcting such fiber collision effects in galaxy clustering statistics based on 
spectroscopic observations. The target galaxy sample is divided into two distinct populations accord- 
ing to the targeting algorithm of fiber placement, one free of fiber collisions and the other consisting 
of collided galaxies. The clustering statistics are a combination of the contributions from these two 
populations. Our method makes use of observations in tile overlap regions to measure the contribu- 
tions from the collided population, and to therefore recover the full clustering statistics. The method 
is rooted in solid theoretical ground and is tested extensively on mock galaxy catalogs. We demon- 
strate that our method can well recover the projected and the full three-dimensional redshift-space 
two-point correlation functions on scales both below and above the fiber collision scale, superior to the 
commonly used nearest neighbor and angular correction methods. We discuss potential systematic 
effects in our method. The statistical correction accuracy of our method is only limited by sample 
variance, which scales down with (the square root of) the volume probed. For a sample similar to 
the final SDSS-III BOSS galaxy sample, the statistical correction error is expected to be at the level 
of 1% on scales ~ 0.1-30 $h^{-1}$ Mpc for the two-point correlation functions. The systematic error only
occurs on small scales, caused by imperfect correction of collision multiplets, and its magnitude is
expected to be smaller than 5%. Our correction method, which can be generalized to other clustering 
statistics as well, enables more accurate measurements of full three-dimensional galaxy clustering on 
all scales with galaxy redshift surveys. 

Subject headings: cosmology: observations — cosmology: theory — galaxies: distances and redshifts 
- galaxies: halos — galaxies: statistics — large-scale structure of Universe 



1. INTRODUCTION 



With fiber-fed spectrographs, galaxy surveys such as the Las Campanas Redshift Survey
(Shectman et al. 1996), the 2dF Galaxy Redshift Survey (2dFGRS; Colless 1999), the Sloan
Digital Sky Survey (SDSS; York et al. 2000; Stoughton et al. 2002; Strauss et al. 2002),
and the Galaxy and Mass Assembly Survey (GAMA; Driver et al. 2011; Robotham et al. 2010)
can efficiently cover a large sky area and obtain redshifts
for a large set of targeted galaxies simultaneously. A well-known
problem of using fibers is that the finite size of
the fiber plugs prevents two fibers from being placed too 
close to one another on the same plate. Consequently a 
significant fraction of targeted galaxies from a photomet- 
ric catalog cannot be assigned fibers and obtain measured 
spectroscopic redshifts. This problem is partly alleviated 
by having some regions on the sky covered by overlapping 
plates, but it still results in a fraction of targeted galax- 
ies left with no measured spectroscopic redshifts (e.g., 
~ 7% in the SDSS). These fiber-collided galaxies are a 
hindrance to any galaxy clustering study. In this paper, 
we propose and test a new method to account for fiber 
collision effects and to accurately measure galaxy clus- 
tering statistics on small and intermediate scales. 

The angular fiber-collision scale, below which two fibers on the same plate collide with
each other, is determined by the fiber placement hardware and differs from survey to
survey. For SDSS-I and II, this scale is 55", corresponding to about 0.1 $h^{-1}$ Mpc at
the median redshift $z \sim 0.1$. For SDSS-III, the angular fiber-collision scale is
slightly larger, 62", corresponding to about 0.4 $h^{-1}$ Mpc (comoving) at the larger
median redshift $z \sim 0.55$ of the Baryon Oscillation Spectroscopic Survey (BOSS;
Aihara et al. 2011; Eisenstein et al. 2011; White et al. 2011; Anderson et al. 2012).
If fiber collisions are not corrected for, galaxy clustering on scales below the
collision scale cannot be accurately measured. For example, in the case of the galaxy
two-point correlation function (2PCF), the effect is seen as a significant decline in
the clustering signal below the collision scale (Jing et al. 1998; Yoon et al. 2008).
At a fixed angular collision scale, the comoving scale increases with redshift, making
fiber collisions a more severe problem for studying small-scale clustering in surveys
at higher redshifts. Furthermore, fiber collisions have a non-negligible effect on
galaxy clustering measurements even on scales larger than the collision scale
(Zehavi et al. 2002, 2005). Therefore, to study galaxy clustering on small scales and
to have precise measurements on larger scales, the effect induced by fiber collisions
has to be corrected for.
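The conversion from an angular collision scale to a comoving transverse scale can be illustrated with a short numerical sketch. It assumes a spatially flat $\Lambda$CDM cosmology with $\Omega_m = 0.25$ (an assumption made here for illustration; other reasonable values change the numbers only slightly), and the function names `comoving_distance` and `collision_scale` are hypothetical.

```python
import math

# c / (100 km/s/Mpc): converts the dimensionless distance integral to h^-1 Mpc.
C_OVER_H100 = 2997.92458


def comoving_distance(z, omega_m=0.25, steps=1000):
    """Comoving distance in h^-1 Mpc for flat LCDM (composite Simpson's rule).

    omega_m = 0.25 is an illustrative assumption, not a fitted value.
    """
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals

    def inv_e(zz):
        return 1.0 / math.sqrt(omega_m * (1.0 + zz) ** 3 + 1.0 - omega_m)

    dz = z / steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * inv_e(i * dz)
    return C_OVER_H100 * total * dz / 3.0


def collision_scale(theta_arcsec, z, omega_m=0.25):
    """Comoving transverse separation (h^-1 Mpc) subtended by theta_arcsec at z.

    In a flat universe the transverse comoving distance equals the
    line-of-sight comoving distance, so the small-angle relation applies.
    """
    theta_rad = math.radians(theta_arcsec / 3600.0)
    return theta_rad * comoving_distance(z, omega_m)


if __name__ == "__main__":
    print("SDSS-I/II, 55 arcsec at z=0.10:", collision_scale(55.0, 0.10))
    print("BOSS,      62 arcsec at z=0.55:", collision_scale(62.0, 0.55))
```

Under these assumptions the two cases come out near 0.1 and 0.4 $h^{-1}$ Mpc, consistent with the values quoted above, and the scale at fixed angle grows with redshift.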

In general, there are two approaches to correcting the fiber collision effect in
measuring galaxy 2PCFs. One is to recover the redshifts of the fiber-collided galaxies
and the other is to reconstruct the correct galaxy pair counts. For the former, a
commonly adopted method is to assign each collided galaxy the redshift of its nearest
angular neighbor (nearest-neighbor correction; e.g., Zehavi et al. 2002, 2005;
Berlind et al. 2006). The method has been applied to measure the projected 2PCF
$w_p(r_p)$ at a transverse separation $r_p$ for the SDSS-I and II Main galaxy sample
(Zehavi et al. 2002, 2005, 2011), where it proves to work well down to the fiber
collision scale $\sim 0.1\,h^{-1}$ Mpc. However, the method fails below the collision
scale and does not give a satisfactory correction for the redshift-space correlation
function. For surveys at higher redshifts, the increase in the (comoving) fiber
collision scale limits the application of the nearest-neighbor correction to
clustering at larger scales.

For the approach of reconstructing the pair counts, a common method is to make use of
the angular 2PCF of the spectroscopic sample, $w_{sz}(\theta)$, and that of the parent
photometric sample, $w_{pz}(\theta)$, at angular separation $\theta$, with the same
angular and redshift selection applied to both samples (Hawkins et al. 2003;
Li et al. 2006a,b; Ross et al. 2007; White et al. 2011). The projected galaxy pair
count in computing $w_p(r_p)$ is weighted by the ratio
$F(\theta) = [1 + w_{pz}(\theta)]/[1 + w_{sz}(\theta)]$, where $\theta$ corresponds to
$r_p$ at the median redshift of the survey. Such a weighting scheme incorporates the
angular information of the missing galaxies and retrieves the correct angular counts.
However, $w_p(r_p)$ and $w(\theta)$ do not correspond exactly. Li et al. (2006b) test
this method with mock catalogs of SDSS galaxies. The method works reasonably well on
large scales, but on scales of 0.05-1 $h^{-1}$ Mpc the corrected $w_p(r_p)$ shows a
clear deficit with respect to the true $w_p(r_p)$.
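As a minimal sketch of the angular weighting just described, the function below evaluates $F(\theta)$ from the two angular correlation functions and applies it to a raw pair count. The function names and the scalar inputs are hypothetical; in practice the weight would be evaluated per angular bin.

```python
def angular_pair_weight(w_pz, w_sz):
    """F(theta) = [1 + w_pz(theta)] / [1 + w_sz(theta)].

    w_pz: angular 2PCF of the parent photometric sample at theta.
    w_sz: angular 2PCF of the spectroscopic sample at the same theta.
    """
    return (1.0 + w_pz) / (1.0 + w_sz)


def weighted_pair_count(raw_count, w_pz, w_sz):
    """Scale a raw spectroscopic pair count by the angular weight."""
    return raw_count * angular_pair_weight(w_pz, w_sz)


if __name__ == "__main__":
    # Illustrative numbers only: a deficit of close spectroscopic pairs
    # (w_sz < w_pz) gives a weight larger than one.
    print(angular_pair_weight(1.0, 0.0))
```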

Other methods aimed at overcoming the fiber collision problem to measure small-scale
galaxy clustering broadly fall into the above two categories or some combination of
them. For example, the small-scale clustering can be inferred from the full
photometric sample by cross-correlating with the spectroscopic sample
(Eisenstein et al. 2005a; Masjedi et al. 2006; Watson et al. 2010, 2011;
Jiang et al. 2011; Wang et al. 2011). Under the assumption of isotropic clustering,
the photometric objects near a spectroscopic target can be considered to be at the
same redshift as the target, similar to the nearest-neighbor method, and the
small-scale clustering can be measured. Contamination from interlopers reduces the
signal-to-noise ratio and needs to be removed statistically. As with the
nearest-neighbor method, this cross-correlation technique does not work for measuring
the full three-dimensional (3D) clustering in redshift space.

Accurate measurements of galaxy clustering are important in many applications. In
particular, the small-scale clustering can be used to probe the spatial distribution
of galaxies inside the host dark matter halos (e.g., Watson et al. 2010, 2011), to
infer the kinematics of galaxies in halos, and to reveal environmental effects on
galaxy formation and evolution. In this paper we propose a new and efficient
correction method for the fiber collision effect based on the spectroscopic galaxy
sample. The method is shown to work well for measuring both the projected galaxy
2PCFs and the full 3D redshift-space 2PCFs. After describing the mock catalogs used
for testing the method and our measurements in Section 2, we present our correction
method and explain its theoretical basis in Section 3. Tests of the method with mock
catalogs and comparisons with other methods are presented in Section 4. We discuss
possible systematic effects of the method in Section 5 and conclude in Section 6.

2. MOCK CATALOGS AND CLUSTERING 
MEASUREMENTS 

Throughout the paper, our proposed new method is tested and compared to other methods
using clustering measurements performed on available realistic mock catalogs. We use
the LasDamas galaxy mock catalogs$^3$ (C. McBride et al. 2012, in prep.), which are
constructed by populating dark matter halos identified in the LasDamas simulations
with galaxies. The $N$-body LasDamas simulations adopt a spatially flat $\Lambda$CDM
cosmology, with a matter density parameter $\Omega_m = 0.25$, a baryon density of
$\Omega_b = 0.04$, $\sigma_8 = 0.8$ (the primordial matter fluctuation amplitude on
scales of 8 $h^{-1}$ Mpc, linearly extrapolated to $z = 0$), a primordial matter
fluctuation spectral index $n_s = 1$, and a Hubble constant of $h = 0.7$. Dark matter
halos are identified using a friends-of-friends algorithm (e.g., Davis et al. 1985)
with a linking length of 0.156 times the mean particle separation. Dark matter halos
are populated with galaxies through a halo occupation distribution (HOD) approach
(e.g., Berlind & Weinberg 2002; Cooray & Sheth 2002; Zheng et al. 2005) and the HOD
parameters are determined through modeling the clustering of the early BOSS
$z \sim 0.5$ CMASS sample (White et al. 2011). Redshift-space distortions are also
included in the mock catalogs by accounting for the peculiar velocities of galaxies.
The radial and angular selection functions in the mock catalogs are constructed to be
uniform.

In total, we make use of 40 LasDamas mock galaxy catalogs. In addition to matching the
clustering of CMASS galaxies, the mocks also reproduce the geometry of the early BOSS
data. As in White et al. (2011), the LasDamas mock catalogs for the CMASS sample are
divided into three separate regions. For simplicity, we only use the so-called
"region B" in White et al. (2011), since it has the largest volume and number of
galaxies. Each mock of this region consists of about 50,000 galaxies, with a sky
coverage of ~ 600 deg$^2$ and a redshift range of $0.4 < z < 0.6$, corresponding to a
volume of ~ 0.16 $h^{-3}$ Gpc$^3$. The mock catalogs do not have the specific tiling
mask and fiber collisions imposed on them, and we impose these ourselves for our tests
as described below.

In this paper, we focus our discussion on the 2PCFs and related statistics. We use the
Landy-Szalay estimator (Landy & Szalay 1993) to measure the 2PCFs of galaxies in the
mock catalogs,

$$ \xi = \frac{DD - 2DR + RR}{RR}, \qquad (1) $$

where DD, DR and RR are the data-data, data-random, and random-random pair counts
measured from the data of $N$ galaxies and random samples consisting of $N_R$ random
points. These pair counts are normalized by dividing by $N(N - 1)/2$, $N N_R$, and
$N_R(N_R - 1)/2$, respectively.
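The estimator of equation (1), including the stated normalizations, can be sketched as follows. The function names are hypothetical and the inputs are raw (unnormalized) pair counts in a single separation bin.

```python
def normalized_counts(dd, dr, rr, n, n_r):
    """Normalize raw pair counts as described below equation (1).

    dd, dr, rr: raw data-data, data-random, random-random pair counts.
    n, n_r:     numbers of galaxies and of random points.
    """
    dd_n = dd / (n * (n - 1) / 2.0)
    dr_n = dr / (n * n_r)
    rr_n = rr / (n_r * (n_r - 1) / 2.0)
    return dd_n, dr_n, rr_n


def landy_szalay(dd, dr, rr, n, n_r):
    """Landy-Szalay estimate of xi in one bin from raw pair counts."""
    dd_n, dr_n, rr_n = normalized_counts(dd, dr, rr, n, n_r)
    return (dd_n - 2.0 * dr_n + rr_n) / rr_n


if __name__ == "__main__":
    # Toy counts: when the normalized DD, DR and RR agree, xi vanishes.
    print(landy_szalay(dd=3, dr=12, rr=6, n=3, n_r=4))
```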

$^3$ http://lss.phy.vanderbilt.edu/lasdamas/

We measure the 3D $\xi(r_p, \pi)$ and $\xi(s, \mu)$ functions and the redshift-space
2PCF $\xi(s)$, where $r_p$ and $\pi$ are the separations of galaxy pairs perpendicular
and parallel to the line of sight (Fisher et al. 1994; Zehavi et al. 2002),
$s^2 = r_p^2 + \pi^2$, and $\mu = \pi/s$ is the cosine of the angle between the
separation vector $\mathbf{s}$ and the line of sight. The redshift-space 2PCF differs
from the real-space one because of the redshift distortion effect induced by galaxy
peculiar velocities. The redshift distortion can be mitigated by projecting the 2PCF
along the line-of-sight direction, with the projected 2PCF $w_p(r_p)$
(Davis & Peebles 1983) defined and measured as

$$ w_p(r_p) = 2 \int_0^\infty \xi(r_p, \pi)\, d\pi = 2 \sum_i \xi(r_p, \pi_i)\, \Delta\pi_i, \qquad (2) $$

where $\pi_i$ and $\Delta\pi_i$ are the $i$th bin of the line-of-sight separation and
its corresponding bin size.
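The discrete sum in equation (2) amounts to the following small sketch, which takes the measured $\xi(r_p, \pi_i)$ values in one $r_p$ bin. The function name and the uniform bin width are assumptions made for illustration.

```python
def projected_2pcf(xi_rp_pi, delta_pi=1.0):
    """w_p(r_p) = 2 * sum_i xi(r_p, pi_i) * delta_pi for one r_p bin.

    xi_rp_pi: sequence of xi(r_p, pi_i) values, one per line-of-sight bin.
    delta_pi: uniform line-of-sight bin width (same units as w_p).
    """
    return 2.0 * sum(x * delta_pi for x in xi_rp_pi)


if __name__ == "__main__":
    # Toy input: a constant xi of 0.5 in 40 bins of unit width.
    print(projected_2pcf([0.5] * 40, delta_pi=1.0))
```

The upper limit of the sum is set simply by the number of line-of-sight bins supplied.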

Following Hamilton (1992), the redshift-space 2PCF $\xi(s, \mu)$ can be written in the
form of a multipole expansion,

$$ \xi(s, \mu) = \sum_l \xi_l(s) P_l(\mu), \qquad (3) $$

where $P_l$ is the $l$-th order Legendre polynomial. The multipole moment $\xi_l$ is
determined by

$$ \xi_l(s) = \frac{2l + 1}{2} \int_{-1}^{1} \xi(s, \mu) P_l(\mu)\, d\mu. \qquad (4) $$

In linear theory, only the moments of $l = 0$, 2, and 4 are non-zero. The monopole
$\xi_0(s)$, quadrupole $\xi_2(s)$, and hexadecapole $\xi_4(s)$ are useful for the
study of redshift distortions and for obtaining constraints on cosmological
parameters (see, e.g., Hamilton 1992; Cole et al. 1994; Tinker et al. 2006;
Padmanabhan & White 2008; Kazin et al. 2012; Reid & White 2011). We will also test
how well our method recovers these moments.
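A simple numerical illustration of equation (4) is given below. It integrates over $\mu$ with a midpoint rule on a fine grid, which is a choice made for this sketch rather than a description of the measurement pipeline; the function names are hypothetical.

```python
def legendre(ell, mu):
    """Legendre polynomials P_0, P_2 and P_4."""
    if ell == 0:
        return 1.0
    if ell == 2:
        return 0.5 * (3.0 * mu ** 2 - 1.0)
    if ell == 4:
        return (35.0 * mu ** 4 - 30.0 * mu ** 2 + 3.0) / 8.0
    raise ValueError("only ell = 0, 2, 4 are implemented in this sketch")


def multipole(xi_of_mu, ell, n_mu=2000):
    """xi_l = (2l+1)/2 * integral_{-1}^{1} xi(mu) P_l(mu) dmu (midpoint rule).

    xi_of_mu: callable returning xi(s, mu) at fixed s.
    n_mu:     number of mu bins across [-1, 1].
    """
    d_mu = 2.0 / n_mu
    total = 0.0
    for i in range(n_mu):
        mu = -1.0 + (i + 0.5) * d_mu
        total += xi_of_mu(mu) * legendre(ell, mu) * d_mu
    return 0.5 * (2 * ell + 1) * total


if __name__ == "__main__":
    # Toy input built from a monopole of 1.0 and a quadrupole of 0.5.
    toy = lambda mu: 1.0 + 0.5 * legendre(2, mu)
    print([multipole(toy, ell) for ell in (0, 2, 4)])
```

Because the Legendre polynomials are orthogonal, a toy input built from known moments should return those moments to within the integration error.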

For measuring the 2PCFs, we use a binning scheme of $\Delta \log r_p = 0.2$,
$\Delta\pi = 1\,h^{-1}$ Mpc, $\Delta \log s = 0.2$, and $\Delta\mu = 0.1$. Our results
are insensitive to these choices. To obtain the projected 2PCF $w_p(r_p)$,
$\xi(r_p, \pi)$ is summed along the line-of-sight direction up to
$\pi_{\rm max} = 40\,h^{-1}$ Mpc. Integrating to a larger line-of-sight separation or
using realistic angular and redshift selection functions will not affect the
conclusions on our correction method. We also note that while the mock catalogs we use
were constructed for studying the BOSS CMASS sample, our general conclusions regarding
the validity of our method do not depend on that and would hold for other mock and
real data sets.

3. THE NEW CORRECTION METHOD 
3.1. Sample Division 

In modern galaxy redshift surveys, a tiling algorithm is 
usually applied to design and place spectroscopic plates 
(tiles) to cover the survey area. These tiles partially over- 
lap over some of the observed region. In such overlap 
regions, a galaxy with no fiber assigned in one tile can 
have a fiber allocated in the other one. For example, in 
SDSS-I and II, about 40% of the survey area is covered 
by more than one tile, which eliminates most of the fiber 
collisions in those regions. 

The basic idea of our method of dealing with the fiber collision effect is simply to
estimate the contribution of the fiber-collided galaxies to the clustering by using
the information in the tile overlap regions. In order to make such an estimate, before
measuring any clustering statistic, we divide the full galaxy target sample (i.e., the
input photometric sample) into two distinct populations:

Population 1: a subsample in which each galaxy is not angularly collided with any
other galaxy in this subsample. We maximize the number of galaxies that are not
collided with each other. Such a set of "decollided" galaxies provides a "clean"
subsample for which no fiber collision correction needs to be considered.

Population 2: a subsample including all the galaxies that are not in Population 1.
This is the set of potentially collided galaxies, and all the fiber-collided galaxies
come from this subsample. Each galaxy in this population is within the fiber collision
scale of a galaxy in Population 1.

The division into Populations 1 and 2 follows the scheme of assigning fibers in SDSS
observations (Blanton et al. 2003). The specific tiling algorithm always recovers one
galaxy from collided pairs, and two galaxies from collided triples if the angular
distance between these two galaxies is larger than the fiber collision scale. With our
division, Population 1 galaxies always have fibers allocated and have spectroscopic
redshifts measured. Some fraction of Population 2 galaxies can also have fibers
allocated in the tile overlap regions. Also with our division, in each "collided"
close pair of galaxies, one galaxy (with measured redshift) will always be part of
Population 1 and the other (with a measured redshift or not) will be part of
Population 2. This definition aims to ensure that the pair counts involving the
galaxies with redshifts in Population 2 (hereafter, we refer to them as "resolved"
galaxies) can be regarded as a representative subset of the overall pair counts. The
specific tiling and fiber assignment constraints can make the situation non-trivial if
the "representative" assumption is not satisfied in real observations, and we discuss
the possible systematics introduced in such a case in Section 5.

We assume that in the survey, we have $N = N_1 + N_2$ galaxy targets, where $N_1$ and
$N_2$ are the numbers of galaxies in Populations 1 and 2, respectively. Again, all
galaxies in Population 1 have spectroscopic redshifts, and because of fiber collisions
only a fraction of Population 2 galaxies do (e.g., in tile overlap regions). We denote
the set of targeted galaxies as $D$ (with a total number $N$) and use $D_1$ and $D_2$
to represent the sets of Population 1 and 2 galaxies in the sample (with numbers of
$N_1$ and $N_2$, respectively). We use $D'_1$ and $D'_2$ to denote the corresponding
galaxy sets that have spectroscopic redshifts measured (with numbers $N'_1$ and
$N'_2$). Note that by definition, $D'_1$ is the same as $D_1$.

The data-data pair counts can then be decomposed as

$$ N_{DD} = N_{D_1D_1} + N_{D_1D_2} + N_{D_2D_2}, \qquad (5) $$

and

$$ N_{D'D'} = N_{D_1D_1} + N_{D_1D'_2} + N_{D'_2D'_2}. \qquad (6) $$

The actual numbers of pairs in equation (5) are what is needed to estimate galaxy
2PCFs, while the pair numbers in equation (6) are what one obtains from the
spectroscopic sample. Our method makes use of the pair counts of Population 1 and 2
galaxies and those of Population 2 galaxies in the tile overlap regions to recover the
correct counts appearing in equation (5), and therefore to measure the 2PCFs properly.
We note that the fraction of $D'_2$ in $D_2$ galaxies, $N'_2/N_2$, is an important
factor in the following discussions. As with the data-data pair counts, the
data-random pair counts DR can also be decomposed in a similar way.

3.2. The Simplified Case 

To illustrate our fiber-collision correction method, we first consider the simplest
case where Population 2 galaxies are randomly selected to be assigned fibers. In this
case, $D'_2$ is a random subset of $D_2$, and the pair counts $N_{D_1D'_2}$ and
$N_{D'_2D'_2}$ are simply proportional to the full counts $N_{D_1D_2}$ and
$N_{D_2D_2}$, i.e.,

$$ N_{D_1D_2} = \frac{N_2}{N'_2} N_{D_1D'_2}, \qquad N_{D_2D_2} = \left(\frac{N_2}{N'_2}\right)^2 N_{D'_2D'_2}. \qquad (7) $$

For simplicity, $N_2$ and $N'_2$ are assumed to be large so that $N_2(N_2 - 1)$ and
$N'_2(N'_2 - 1)$ are replaced with $N_2^2$ and $N'^2_2$, respectively. The above
equations provide the way to correct the pair counts obtained from the spectroscopic
sample.

Similarly, the data-random pair counts can be corrected as

$$ N_{D_2R} = \frac{N_2}{N'_2} N_{D'_2R}. \qquad (8) $$

Since we can have a large number of random points, the correction here for $N_{D_2R}$
is less noisy than those for $N_{D_1D_2}$ and $N_{D_2D_2}$.
The full pair counts are reconstructed as

$$ N_{DD} = N_{D_1D_1} + \frac{N_2}{N'_2} N_{D_1D'_2} + \left(\frac{N_2}{N'_2}\right)^2 N_{D'_2D'_2}, \qquad (9) $$

$$ N_{DR} = N_{D_1R} + \frac{N_2}{N'_2} N_{D'_2R}. \qquad (10) $$
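Equations (9) and (10) translate directly into a few lines of code. The sketch below takes the raw pair counts measured from the spectroscopic sample in one separation bin; the function and argument names are hypothetical.

```python
def corrected_pair_counts(n_d1d1, n_d1d2p, n_d2pd2p, n_d1r, n_d2pr,
                          n2, n2_resolved):
    """Reconstruct the full DD and DR pair counts, equations (9) and (10).

    n_d1d1, n_d1d2p, n_d2pd2p: raw D1D1, D1D'2 and D'2D'2 pair counts.
    n_d1r, n_d2pr:             raw D1R and D'2R pair counts.
    n2, n2_resolved:           N_2 and N'_2 (all / resolved Population 2).
    """
    f = n2 / n2_resolved  # inverse of the resolved fraction N'_2 / N_2
    n_dd = n_d1d1 + f * n_d1d2p + f ** 2 * n_d2pd2p
    n_dr = n_d1r + f * n_d2pr
    return n_dd, n_dr


if __name__ == "__main__":
    # Toy counts with half of Population 2 resolved (f = 2).
    print(corrected_pair_counts(100, 10, 3, 1000, 50, n2=20, n2_resolved=10))
```

Cross pairs scale with one power of $N_2/N'_2$ and Population 2 auto pairs with two, which is why the $D'_2D'_2$ term is the noisiest after correction.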

Figure 1 shows the data-data pair counts from the mock catalogs for the simplified
case, decomposed according to equation (9). The fiber collisions are artificially
imposed on the mocks with an overall fraction of $N'_2/N_2 = 0.42$ as in the BOSS
CMASS catalogs. The solid and dotted curves are the corrected pair counts, obtained
from the spectroscopic samples, averaged over the 40 mocks, while the squares show the
actual full pair counts. The shaded region in each panel denotes the $1\sigma$ scatter
from the 40 mock catalogs of the DD pair count (estimated from $D'D'$ based on
Eq. (7)). Comparing the solid curve (plus the small shaded region) with the squares,
we see that our correction method accurately recovers the true pair counts over all
scales probed, for both $\xi(s)$ and $\xi(r_p, \pi)$. The increase in the scatter on
small scales is caused by shot noise, since there are fewer $D_1D'_2$ and $D'_2D'_2$
pairs than $D_1D_2$ and $D_2D_2$ pairs.

From Figure 1 we see that there are almost no $D_1D_1$ pairs with separation smaller
than ~ 0.3 $h^{-1}$ Mpc, which is the minimum fiber collision scale (corresponding to
the lowest redshift $z = 0.4$). On small scales, the pair count is dominated by
$D_1D_2$ pairs, while $D_2D_2$ pairs have a small but non-negligible contribution.
Across the fiber collision scale (~ 0.3-0.5 $h^{-1}$ Mpc), the dominant contribution
to the total pair count shifts from $D_1D_2$ pairs to $D_1D_1$ pairs, reflecting the
change from collided to decollided galaxies.

Fig. 1. The full pair counts $N_{DD}$ (solid lines) for $\xi(s)$ and $\xi(r_p, \pi)$
in the simplified case. For the latter, we only show the example of
$\pi = 2\,h^{-1}$ Mpc. The solid and dotted curves are the corrected pair counts
averaged over the 40 mocks, obtained from the spectroscopic galaxies, while the
squares show the actual pair counts. The shaded areas are the $1\sigma$ error
distribution of the corrected $N_{DD}$. The corrected pair counts are further
decomposed into the contributions from the different populations, $N_{D_1D_1}$,
$N_{D_1D_2}$, and $N_{D_2D_2}$, shown as the dotted lines, labeled here simply as 11,
12, and 22. The vertical dashed lines denote the physical fiber collision scale
corresponding to the fiber collision angular constraint, determined by the highest
redshifts in the mocks.

3.3. Theoretical Basis 

Our correction method can be put in terms of a decomposition of the 2PCF. With a
galaxy sample divided into subsamples, such as red and blue galaxies, central and
satellite galaxies, or in our case Population 1 and 2 galaxies, the 2PCF can be
decomposed into contributions from the two-point auto- and cross-correlation functions
of subsample galaxies (Zu et al. 2008). This is a fully equivalent way of describing
the pair decomposition. In the case of dividing the galaxy sample into Population 1
and 2 subsamples, we have

$$ N^2 \xi = N_1^2 \xi_{11} + 2 N_1 N_2 \xi_{12} + N_2^2 \xi_{22} \qquad (11) $$

$$ \approx N_1^2 \xi_{11} + 2 N_1 N_2 \xi_{12'} + N_2^2 \xi_{2'2'}, \qquad (12) $$

where $\xi$ can be either $\xi(s)$, $\xi(r_p, \pi)$, or $w_p(r_p)$, and $\xi_{11}$,
$\xi_{12}$, and $\xi_{22}$ are the two-point auto-correlation function of Population 1
galaxies, the cross-correlation function between Population 1 and Population 2
galaxies, and the auto-correlation function of Population 2 galaxies, respectively.
The second line in the equation uses the component correlation functions $\xi_{12'}$
and $\xi_{2'2'}$ estimated from the $D_1$ and $D'_2$ galaxies in the spectroscopic
sample to approximate $\xi_{12}$ and $\xi_{22}$, which is the key point of our
correction method.
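The weighting in equations (11) and (12) can be written as a one-line combination of the component correlation functions. The sketch below is illustrative only; the function name is hypothetical and the inputs are component values in a single bin.

```python
def combine_components(n1, n2, xi11, xi12, xi22):
    """Total 2PCF from its components, following equations (11)-(12).

    n1, n2: numbers of Population 1 and Population 2 targets (N_1, N_2).
    xi11:   auto-correlation of Population 1.
    xi12:   cross-correlation (xi_12, or its estimate xi_12').
    xi22:   auto-correlation of Population 2 (xi_22, or xi_2'2').
    """
    n = n1 + n2
    return (n1 ** 2 * xi11 + 2.0 * n1 * n2 * xi12 + n2 ** 2 * xi22) / n ** 2


if __name__ == "__main__":
    # Toy case: about 10% of targets in Population 2.
    print(combine_components(900, 100, xi11=0.3, xi12=2.0, xi22=5.0))
```

Note that the weights use the full target numbers $N_1$ and $N_2$, not the number of resolved galaxies; the resolved subsample enters only through the estimates of the component functions.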

If $D'_2$ galaxies are representative of $D_2$ galaxies, i.e., $D'_2$ is a random
subset of $D_2$, the ensemble average of $\xi_{12'}$ ($\xi_{2'2'}$) should equal that
of $\xi_{12}$ ($\xi_{22}$):

$$ \langle \xi_{12'} \rangle = \langle \xi_{12} \rangle, \qquad \langle \xi_{2'2'} \rangle = \langle \xi_{22} \rangle. \qquad (13) $$

That is, with ensemble averages, the approximate symbol can be replaced with the equal
sign. Therefore, our method using equation (12) provides an unbiased correction, which
is a desired merit. In practice, the method is only applied to one realization in the
ensemble, so differences are expected between the true underlying 2PCF and the 2PCF
corrected with our method. However, any discrepancies would only result from the fact
that a smaller number of galaxies are used to estimate the component 2PCFs. They show
up as sample variance, not systematic errors. Potential systematics caused by the
violation of the "representative" assumption of $D'_2$ galaxies are discussed in
Section 5.

The component 2PCFs can be estimated with the Landy-Szalay estimator,

$$ \xi_{11} = \frac{D_1D_1 - 2 D_1R_1 + R_1R_1}{R_1R_1}, \qquad (14) $$

$$ \xi_{12'} = \frac{D_1D'_2 - D_1R_2 - D'_2R_1 + R_1R_2}{R_1R_2}, \qquad (15) $$

$$ \xi_{2'2'} = \frac{D'_2D'_2 - 2 D'_2R_2 + R_2R_2}{R_2R_2}. \qquad (16) $$

In the case that $D'_2$ is a random subset of $D_2$, we can use the same random
sample, i.e., $R_1 = R_2 = R$ in the above equations. It is not surprising that when
substituting the above equations into equation (12), we end up with exactly the same
result as with equations (9) and (10).
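The stated equivalence can be checked numerically. The sketch below assumes a single random catalog ($R_1 = R_2 = R$) and the large-number normalization in which $N(N-1)/2$ is replaced by $N^2/2$; under these assumptions the two routes agree to floating-point precision. The dictionary keys and function names are hypothetical.

```python
def xi_from_corrected_counts(c, n1, n2, n2p, n_r):
    """Landy-Szalay xi from the corrected counts of equations (9) and (10)."""
    f = n2 / n2p
    n = n1 + n2
    n_dd = c["d1d1"] + f * c["d1d2p"] + f ** 2 * c["d2pd2p"]
    n_dr = c["d1r"] + f * c["d2pr"]
    dd = n_dd / (n ** 2 / 2.0)
    dr = n_dr / (n * n_r)
    rr = c["rr"] / (n_r ** 2 / 2.0)
    return (dd - 2.0 * dr + rr) / rr


def xi_from_components(c, n1, n2, n2p, n_r):
    """xi from equations (14)-(16) combined through equation (12)."""
    rr = c["rr"] / (n_r ** 2 / 2.0)
    d1r = c["d1r"] / (n1 * n_r)
    d2pr = c["d2pr"] / (n2p * n_r)
    xi11 = (c["d1d1"] / (n1 ** 2 / 2.0) - 2.0 * d1r + rr) / rr
    xi12p = (c["d1d2p"] / (n1 * n2p) - d1r - d2pr + rr) / rr
    xi2p2p = (c["d2pd2p"] / (n2p ** 2 / 2.0) - 2.0 * d2pr + rr) / rr
    n = n1 + n2
    return (n1 ** 2 * xi11 + 2.0 * n1 * n2 * xi12p + n2 ** 2 * xi2p2p) / n ** 2


if __name__ == "__main__":
    # Arbitrary toy pair counts in one bin; not taken from any catalog.
    counts = {"d1d1": 500, "d1d2p": 60, "d2pd2p": 4,
              "d1r": 9000, "d2pr": 400, "rr": 50000}
    args = (counts, 900, 100, 42, 10000)
    print(xi_from_corrected_counts(*args), xi_from_components(*args))
```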

3.4. The Tiled Case 

Taking into account the tiling in the real observational situation changes the
geometry of the distribution of $D'_2$ galaxies such that most of them preferentially
populate the tile-overlap regions. The survey geometry can be described by the
individual sectors, defined by areas of sky covered by unique sets of tiles
(Blanton et al. 2003). We use $N_{\rm tile}$ to denote the number of tiles covering
each sector. For $N_{\rm tile} = 1$ regions, essentially no fiber-collided galaxies
($D_2$) can have spectroscopic redshifts. For $N_{\rm tile} \geq 2$ regions, however,
most of these will be resolved, with spectroscopic redshifts measured as a result of
repeated observations in these tile overlap regions. In practice, some fiber-collided
$D_2$ galaxies in $N_{\rm tile} = 1$ regions may still have spectroscopic redshifts
from other surveys. In the case of the CMASS sample, about 5.5% of all the galaxies do
not have redshifts. Specifically, the values of the ratio $N'_2/N_2$ are about 5%, 71%
and 87% in $N_{\rm tile} = 1, 2, 3$ regions, respectively. The 5% recovered galaxies
in $N_{\rm tile} = 1$ regions are included from the "Legacy survey" of SDSS-I and II
(Eisenstein et al. 2011).

In order to mimic the full observational case, we impose the tile placement of the
BOSS survey on all mock catalogs as well as fiber collisions according to the tiling
algorithm with the appropriate $N'_2/N_2$ ratios specified above. We stress that the
validity of our method does not depend on the specific values of these parameters,
and these are simply chosen here to resemble the values in the BOSS survey. Figure 2
illustrates the galaxy distribution and the tile placement in a small section of one
of our mocks. The open symbols are all $D_1$ galaxies. Filled squares denote $D_2$
galaxies, where blue ones mark resolved galaxies ($D'_2$) and red ones are those
galaxies in $D_2$ whose redshifts are missing due to fiber collisions. The $D'_2$
galaxies are not randomly distributed over the whole observed sky; they mostly occupy
regions with $N_{\rm tile} \geq 2$.

Fig. 2. An example of the galaxy distribution in one of our mocks. Large circles are
the placed plug-plate tiles. The open symbols are decollided $D_1$ galaxies. The blue
squares denote the $D'_2$ galaxies (i.e., resolved $D_2$ galaxies, with fibers
assigned), while the red squares are those $D_2$ galaxies without any fiber assigned.
We also mark the different $N_{\rm tile}$ regions in the figure.

For the tiled case, our correction method as in Section 3.3 remains the same, and the
only modification is the need to account for the specific geometry of the $D'_2$
galaxy distribution. One straightforward way to do this is to create separate random
catalogs $R_1$ and $R_2$ for $D_1$ and $D'_2$ galaxies. The $R_1$ catalog can be
created as usual by incorporating the standard radial and angular selection functions
of the sample. We note that the latter is commonly characterized and applied as a
function of the individual sectors (e.g., Zehavi et al. 2002). For the $R_2$ catalog,
an additional angular completeness mask needs to be applied, with the number of random
points in each individual sector modified by the corresponding $N'_2/N_2$ ratio. For
example, in Figure 2 the circular tile centered at RA ~ 158° and Dec ~ -2° is
comprised of 8 individual sectors, as a result of tile overlap, each with its own
$N'_2/N_2$ value. In the extreme case that there are no $D'_2$ galaxies in
$N_{\rm tile} = 1$ regions, these regions would be empty in the $R_2$ catalog.
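One possible way to build such an $R_2$ catalog is to thin the $R_1$ random points sector by sector. The sketch below keeps each random point with probability equal to the $N'_2/N_2$ ratio of its sector; the dictionary-based point representation, the sector labels, and the function name are all assumptions made for illustration.

```python
import random


def build_r2(r1_points, sector_fraction, seed=0):
    """Down-sample R1 random points into an R2 catalog.

    r1_points:       list of dicts, each with a "sector" key (hypothetical
                     representation of a random point).
    sector_fraction: mapping from sector label to its N'_2 / N_2 ratio.
    seed:            seed for the random generator, for reproducibility.
    """
    rng = random.Random(seed)
    return [p for p in r1_points
            if rng.random() < sector_fraction[p["sector"]]]


if __name__ == "__main__":
    # Toy sectors with ratios resembling the N_tile = 1, 2, 3 values quoted above.
    r1 = ([{"sector": "one_tile"}] * 10000
          + [{"sector": "two_tiles"}] * 10000
          + [{"sector": "three_tiles"}] * 10000)
    fractions = {"one_tile": 0.05, "two_tiles": 0.71, "three_tiles": 0.87}
    r2 = build_r2(r1, fractions)
    print(len(r2))
```

A sector with a ratio of zero ends up empty in $R_2$, matching the extreme case described above.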

Alternatively, one could choose to up-weight $D'_2$ galaxies in each sector by
$N_2/N'_2$, instead of creating a separate random sample for the $D'_2$ galaxies.
However, for $N_{\rm tile} = 1$ regions, $N'_2/N_2$ is usually a tiny number, which
will introduce large errors when adopting the up-weighting method. Therefore, we
prefer to down-weight the random catalog to accurately account for the angular
distribution and selection of $D'_2$ galaxies.

4. TESTING THE METHOD

We test our correction method and compare with other 
commonly used methods by measuring the 2PCFs and 
related statistics in the LasDamas mocks. We first show 
results for the simplified case and then test our method 
on our tiled mocks. 

4.1. The Simplified Case 

To apply our correction method, we first divide the mock galaxies into the two
populations as discussed in § 3.1. We initially define all galaxies in the sample as
$D_1$ and proceed as follows: (1) for each collided pair in $D_1$, we randomly
reassign one galaxy to $D_2$; (2) for each collided triplet (and higher multiplet) in
$D_1$, we assign to $D_2$ the galaxy that collides with the most $D_1$ galaxies;
(3) we repeat steps (1) and (2) until none of the galaxies in $D_1$ collide with one
another. This procedure maximizes the number of galaxies in $D_1$ and mimics the real
tiling algorithm. In reality, some objects (e.g., quasars) have higher fiber
allocation priority and some assigned fibers do not result in reliable redshifts due
to hardware limitations. This complicates the tiling algorithm, but the targeted
galaxies can still be divided into the two populations properly. The population
division results in about 9.5% of all galaxies being in the $D_2$ sample. We then
randomly assign fibers to 42% of the $D_2$ galaxies (i.e., $N'_2/N_2 = 0.42$), so that
about 5.5% of all galaxies do not have fibers assigned, mimicking the fiber collision
fraction in the CMASS sample. With the simplest case considered here, we only need to
create one random catalog.
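The population division can be sketched with a simple greedy loop. This is an illustration under stated assumptions rather than the tiling code itself: positions are treated as flat-sky coordinates, collisions are found by brute force, and at each step the galaxy colliding with the most remaining $D_1$ galaxies is moved to $D_2$ (ties, including isolated pairs, are broken randomly). A greedy loop of this kind is not guaranteed to reach the true maximum of $D_1$. The function names are hypothetical.

```python
import math
import random


def split_populations(positions, collision_radius, seed=0):
    """Greedy division of targets into decollided D1 and collided D2.

    positions:        list of (x, y) flat-sky coordinates (an assumption).
    collision_radius: fiber collision scale in the same units.
    Returns (d1, d2) as sets of indices into positions.
    """
    rng = random.Random(seed)
    n = len(positions)
    neighbors = {i: set() for i in range(n)}
    for i in range(n):  # brute-force O(n^2) search, fine for a sketch
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) < collision_radius:
                neighbors[i].add(j)
                neighbors[j].add(i)

    d1, d2 = set(range(n)), set()
    while True:
        counts = {i: len(neighbors[i] & d1) for i in d1}
        worst = max(counts.values(), default=0)
        if worst == 0:
            break  # no collisions remain within D1
        candidates = sorted(i for i, c in counts.items() if c == worst)
        chosen = rng.choice(candidates)
        d1.remove(chosen)
        d2.add(chosen)
    return d1, d2


def assign_fibers(d2, resolved_fraction, seed=0):
    """Randomly pick the resolved subset D'2 of D2 (simplified case)."""
    rng = random.Random(seed)
    n_resolved = round(resolved_fraction * len(d2))
    return set(rng.sample(sorted(d2), n_resolved))


if __name__ == "__main__":
    # A collided triplet in a line plus one isolated galaxy.
    pts = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (10.0, 10.0)]
    print(split_populations(pts, collision_radius=0.6))
```

The quoted fractions are mutually consistent: with 9.5% of targets in $D_2$ and 42% of those resolved, the unresolved fraction is $0.095 \times 0.58 \approx 5.5\%$.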

Figure 3 shows the 2PCFs ($w_p$ and $\xi(s)$) from our correction method and the
comparisons with those from other methods. Filled squares represent the actual 2PCFs,
measured using all the mock galaxies. Error bars reflect the $1\sigma$ variation among
the 40 LasDamas mocks. Open squares are obtained using only those galaxies that have
fibers assigned, corresponding to the case without any fiber-collision correction. In
each panel, the vertical dotted line is the fiber-collision scale corresponding to the
highest redshifts included in the mocks. Without accounting for fiber collisions, the
measured 2PCF drops sharply below the collision scale, and it is also underestimated
on larger scales. Note that the error bars in the top panels represent the variance
(fluctuation) among the 40 mocks, and they do not reflect directly the accuracy of our
correction method. In the bottom left panel, we compute the ratio
$w_p/w_{p,{\rm true}}$ (and similarly for $\xi(s)$ in the bottom right panel) for each
mock, where $w_p$ and $w_{p,{\rm true}}$ are the corrected 2PCF and the true one. The
error bars are the $1\sigma$ variation of the ratio among the 40 mocks, which reflects
the mean accuracy of our correction method for one mock. The mean ratio curve
corresponds to a volume 40 times larger than one mock, showing clearly that our
correction method is unbiased.
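The per-mock ratio statistics described above can be computed as in the following sketch for a single separation bin. The use of the sample standard deviation (with $n - 1$ in the denominator) and the function name are assumptions made here.

```python
import math
import statistics


def ratio_stats(corrected, true):
    """Mean, 1-sigma scatter, and error on the mean of corrected/true ratios.

    corrected, true: per-mock values of the corrected and true 2PCF in one
                     separation bin (equal-length sequences, length >= 2).
    """
    ratios = [c / t for c, t in zip(corrected, true)]
    mean = statistics.mean(ratios)
    scatter = statistics.stdev(ratios)  # sample standard deviation
    error_on_mean = scatter / math.sqrt(len(ratios))
    return mean, scatter, error_on_mean


if __name__ == "__main__":
    # Toy values for four mocks; not taken from the LasDamas measurements.
    print(ratio_stats([1.02, 0.97, 1.01, 1.00], [1.0, 1.0, 1.0, 1.0]))
```

The scatter characterizes the accuracy expected for a single mock, while the error on the mean, smaller by the square root of the number of mocks, is the relevant uncertainty when judging whether the mean ratio is consistent with unity.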

In each panel, the blue solid curve is the 2PCF obtained with our correction method,
averaged over the 40 mocks. Error bars are the $1\sigma$ scatter among the 40 mocks.
Our correction method works very well for both $w_p$ and $\xi(s)$ over all measured
scales. In particular, it works down to the smallest scales for which we have enough
pair counts to estimate the 2PCF ($r_p \sim 0.1\,h^{-1}$ Mpc). The fractional errors
for $w_p(r_p)$ on small scales are smaller than those for $\xi(s)$, since for the same
value and bin size of $r_p$ and $s$ there are more pairs in computing $w_p(r_p)$ than
$\xi(s)$ due to the projection over large line-of-sight separations. On scales above
1 $h^{-1}$ Mpc, the errors in both $w_p$ and $\xi(s)$ are similar, at the level of
<3%.

The full collision-free 2PCFs and the ones corresponding
to our new correction method are further decomposed, in
the top panels, into the contributions from the component
2PCFs (dashed and dotted lines; see Eqs. (11) and (12)).
Our correction recovers well all the components of the
correlation function. Similar to Figure 1, the contribution
from $\xi_{11}$ dominates on large scales, and it quickly
decreases below the fiber collision scale. The contribution
from component $\xi_{12}$ dominates on scales smaller than
the collision scale, where $\xi_{22}$ also makes a
non-negligible contribution. Without any fiber collision
correction, the measured 2PCFs (open squares) are still
a combination of the three component 2PCFs, but with
different coefficients than those in Equation (12). The
significant decline at the fiber collision scale in the
non-corrected $w_p(r_p)$ reflects the transition from the
$\xi_{11}$-dominated to the $\xi_{12}$-dominated regime. The
low amplitude below the collision scale reflects the lack of
contribution from $\xi_{12}$.

For comparison, the nearest-neighbor method (red
curves) yields the correct $w_p(r_p)$ above the fiber
collision scale, due to the line-of-sight projection, but it
clearly overestimates $w_p(r_p)$ below this scale (see also
Zehavi et al. 2002, 2005), because of the increasing
importance of the line-of-sight separation on small scales.
For $\xi(s)$, the nearest-neighbor estimate fails to recover
the correct values over most scales, deviating by more
than $1\sigma$ for all scales < 10 $h^{-1}$ Mpc. Thus its use is
very limited for redshift-space clustering measurements.
The 2PCFs (both $w_p$ and $\xi(s)$) from the angular
correction method systematically deviate from the true
values except for small scales, and on large scales they
approach the estimate with no fiber-collision correction
applied (see White et al. 2011). The deviation is at
a level < 10%, consistent with the finding of Li et al.
(2006b). Physical explanations for the deviations seen
in the nearest-neighbor and angular corrections are
provided in Section 6.

In contrast, our correction method appears unbiased,
not showing any systematic errors. The measurement
errors are larger than those with all galaxies (top panels in
Figure 3). This is easy to understand: our method only
uses galaxies with fibers assigned (i.e., with spectroscopic
redshifts), which are fewer than the total number of galaxies,
and therefore the sample variance increases. A survey of
larger volume would help to reduce the sample variance.
The other two commonly used correction methods do in- 
troduce scale-dependent systematic errors, although the 
nearest-neighbor method does work remarkably well for 
w p measurements on large scales. 

4.2. The Tiled Case 

To test our method in a more realistic situation,
we impose the BOSS tiling geometry on the mocks (see
an illustration in Figure 2). The division into $D_1$ and $D_2$
galaxies is still the same as in Section 4.1. By definition,
all $D_1$ galaxies have fibers assigned. We then randomly
assign fibers to $N_2'/N_2$ = 5%, 71%, and 87% of $D_2$
galaxies in $N_{\rm tile}$ = 1, 2, and 3 regions, respectively. These







Fig. 3. — Tests of different fiber-collision correction methods for the projected correlation function, $w_p(r_p)$, and the redshift-space
correlation function, $\xi(s)$, for the simplified case. Top panels show the results for $r_p \times w_p(r_p)$ (left) and $s \times \xi(s)$ (right), while the ratios
of the estimates to the full measurements, without any missing fiber-collided galaxies, are shown on the bottom. Solid lines correspond to
the different correction methods, filled black squares are the true full measurements, and open magenta squares are the results without any
fiber-collision correction applied. Error bars reflect the $1\sigma$ variation among the 40 LasDamas mock catalogs. (In the top panels, for clarity,
error bars are only shown for the true case and for our correction.) We plot as well the individual components of the 2PCF decomposition
in the true case (black dotted lines; Eq. 11) and for our correction (blue dashed lines; Eq. 12). The vertical dotted lines denote the physical
fiber collision scale corresponding to the fiber collision angular constraint, determined by the highest redshifts in the mocks.




Fig. 4. — The same as Figure 3, but now for the tiled mocks.
fractions are consistent with those in the CMASS sample.
As in the simplified case, about 5.5% of all galaxies are
left without fibers.

As explained in Section 3.4, we create separate random
catalogs $R_1$ and $R_2$ for $D_1$ and $D_2'$ galaxies, respectively.
The $R_1$ catalog has the overall survey geometry of the
mock samples. For the $R_2$ catalog, in addition to the
overall geometry, we impose a random sub-sampling of a
$N_2'/N_2$ fraction of points in each individual sector. Note
that when applying the method to actual data, it is best
to use the individual $N_2'/N_2$ values in each sector for
constructing $R_2$, rather than the average value for each
$N_{\rm tile}$, to fully account for the distribution of $D_2'$ galaxies.
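
The per-sector sub-sampling of the second random catalog can be sketched as below. This is only an illustration under stated assumptions: each random point is assumed to carry an integer sector label, the completeness $N_2'/N_2$ is supplied as a dictionary keyed by that label, and the function name and toy values are hypothetical.

```python
import numpy as np

def subsample_r2(sector_ids, completeness_by_sector, rng):
    """Return a boolean mask selecting the random points kept in R_2.

    sector_ids             : integer sector label of each random point
    completeness_by_sector : dict mapping sector label -> N_2'/N_2 in that sector
    rng                    : numpy.random.Generator
    """
    sector_ids = np.asarray(sector_ids)
    # Look up the per-sector completeness for every random point.
    keep_prob = np.array([completeness_by_sector[int(s)] for s in sector_ids])
    # Keep each point with probability equal to its sector completeness.
    return rng.random(sector_ids.size) < keep_prob

# Toy example: three sectors with the completeness values quoted for
# N_tile = 1, 2, and 3 regions, used here purely as illustrative inputs.
rng = np.random.default_rng(42)
sectors = rng.integers(0, 3, size=100_000)
completeness = {0: 0.05, 1: 0.71, 2: 0.87}
mask = subsample_r2(sectors, completeness, rng)
for s, c in completeness.items():
    kept = mask[sectors == s].mean()
    print(f"sector {s}: target {c:.2f}, kept fraction {kept:.3f}")
```

Drawing an independent uniform number per point keeps the angular distribution within each sector random while matching the sector completeness on average.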

The 2PCF results from the different correction methods
on the tiled mocks are shown in Figure 4. They
are quite similar to those seen in Figure 3 for the
simplified case. Our correction method still accurately recovers
both $w_p(r_p)$ and $\xi(s)$ over all measured scales. The
errors increase only slightly, compared to the previous case,
reflecting the increased sample variance (as $D_2'$ galaxies
now occupy mostly the smaller tile overlap regions). For
$w_p(r_p)$, the errors are about 8% at $r_p \sim 0.1\,h^{-1}$ Mpc and
13% at $r_p \sim 30\,h^{-1}$ Mpc for one mock (top left panel in
Figure 4). These errors include the intrinsic fluctuation
among the mocks. Similar to Figure 3, the bottom panels
of Figure 4 plot the ratios $w_p/w_{p,\rm true}$ and $\xi/\xi_{\rm true}$,
reflecting the expected accuracy of the correction methods.
For $w_p$, the correction errors for our method are about
6% at $r_p \sim 0.1\,h^{-1}$ Mpc and ~2.5% at $r_p \sim 30\,h^{-1}$ Mpc
for the volume of one mock. The mean ratio in each
panel again demonstrates that the method is unbiased.
The success of our method also implies that applying the
completeness $N_2'/N_2$ to the $R_2$ random catalog is the
proper way to account for the angular distribution of $D_2'$
galaxies. The level of accuracy and associated systematics
for the nearest-neighbor and angular corrections also
remain similar to those in the simplified case.

We have tested as well on the mocks the recovery
of the full 3D $\xi(r_p, \pi)$ correlation function and its
moments. The 2PCF $\xi(s)$ in the right panels of Figure 4
is the monopole of the 3D redshift-space 2PCF. In
Figure 5, we show the quadrupole $\xi_2(s)$ (left) and
hexadecapole $\xi_4(s)$ (right) from the different correction
methods. Again, our method provides unbiased estimates of
these quantities. Note that the large error bars in the
ratios near $s \sim 0.1\,h^{-1}$ Mpc and $s \sim 5\,h^{-1}$ Mpc in the
bottom left panel and below $s \sim 0.4\,h^{-1}$ Mpc in the
bottom right panel result from the fact that the corresponding
multipole is near zero at these scales. The success
of our method has important implications for studying
redshift-space distortions in the non-linear regime (e.g.,
Tinker et al. 2006; Tinker 2007; Reid & White 2011),
which has been hindered so far by the fiber collision
effect. In contrast, the nearest neighbor method
completely fails to recover the small-scale redshift distortions,
because it effectively reduces the line-of-sight separations
of pairs, washing out the Fingers-of-God signal. Residual
systematics remain for it on larger scales as well.
The angular correction method works reasonably well,
but systematic deviations at a 10% level persist in the
quadrupole and hexadecapole.

5. SYSTEMATIC EFFECTS 



One key assumption in our method presented above is
that the $D_2'$ galaxies are a representative (or random)
subset of $D_2$ galaxies, which ensures that pair counts
involving $D_2$ galaxies can be recovered from those involving
$D_2'$ galaxies in a simple way.

In reality, however, the "representative" assumption 
may not be fully satisfied, given the constraints from 
tiling and fiber assignment algorithm. There are two 
types of potential systematic effects in our correction 
method, when applied to the real observational data. 
The first is related to the possible difference in the tar- 
get density between the overlap and non-overlap regions, 
due to specifics of the tiling algorithm. The second is re- 
lated to galaxy pairs in collided triplets (or higher-order 
collided groups). In what follows, we discuss these two 
effects and provide solutions. 

5.1. Density Effect 

If the tiling algorithm is optimized to assign the
most fibers to the galaxy targets, it may preferentially
place overlapping tiles in regions of higher number density
(Blanton et al. 2003; Tinker et al. 2011, private
communication). In such a case, $D_2'$ galaxies come from regions
of slightly higher number density, not necessarily
representative of the overall $D_2$ galaxies. This may limit the
accuracy of our use of $\langle\xi_{12'}\rangle = \langle\xi_{12}\rangle$ and $\langle\xi_{2'2'}\rangle = \langle\xi_{22}\rangle$.
Observationally, the impact of such potential density 
variations on our method can be evaluated, if we have 
a complete representative observed area, where all fiber- 
collided galaxies have been resolved by repeat observa- 
tions. Theoretically, a set of mocks with the actual tiling 
algorithm used in observations applied directly to it, can 
help understand the impact. Since neither the required 
observed area nor the realistic tiled mocks are yet avail- 
able, we perform various tests with our mocks to estimate 
the impact of the density effect. 

We define a galaxy density measure of the overlap
regions ("overlap density") as $\delta = n_{\rm overlap}/n_{\rm all} - 1$, where
$n_{\rm overlap}$ and $n_{\rm all}$ are the number densities of galaxies in
the overlap regions and the whole survey region,
respectively. Then the question becomes whether the accuracy
of our recovered 2PCFs depends on the value of $\delta$. While
we do not expect systematic density variations in the
overlap regions in the ensemble of our tiled mocks, the
variation caused by sample variance in individual mocks
can be used to study the density effect.
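
As a small illustration of this definition, the sketch below computes the overlap density from galaxy counts and areas. It assumes that the number densities are surface densities (counts per unit sky area), which is an assumption of this example; the function name and input numbers are hypothetical.

```python
def overlap_density(n_gal_overlap, area_overlap, n_gal_all, area_all):
    """Overlap density delta = n_overlap / n_all - 1.

    Counts are numbers of galaxies; areas must share the same units
    (for example deg^2). Densities are taken as counts per unit area.
    """
    n_overlap = n_gal_overlap / area_overlap
    n_all = n_gal_all / area_all
    return n_overlap / n_all - 1.0

# Toy numbers: overlap regions that are 10% denser than the survey average.
delta = overlap_density(n_gal_overlap=110, area_overlap=10.0,
                        n_gal_all=1000, area_all=100.0)
print(f"delta = {delta:.3f}")
```

A value of zero means the overlap regions have the same density as the survey as a whole; positive values indicate denser overlap regions.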

We have calculated the "overlap density" for each of
our 40 mock catalogs using the number densities of both
all the galaxies and just the $D_2$ ones, which we denote
$\delta_{\rm all}$ and $\delta_{D_2}$, respectively. Figure 6 shows the
dependence of $w_p/w_{p,\rm true}$ on the two overlap density measures,
at different scales $r_p$. In each panel, the open circles are
the measurements from the 40 LasDamas mocks and the
solid line is the linear least-squares fit to the data points.
On large scales ($r_p > 1\,h^{-1}$ Mpc), we find no dependence
of the 2PCF on the overlap density. On small scales
($r_p < 1\,h^{-1}$ Mpc), the scatter is large, but the 2PCF
ratio is again consistent with no dependence on the overlap
density. Overall, no strong systematic dependence on
either $\delta_{\rm all}$ or $\delta_{D_2}$ is found over the overlap density range
probed by our mocks (-0.06 to 0.06). To put this into
the context of real observations, we calculate the overlap
densities in the CMASS DR9 sample and find them to
be $\delta_{\rm all} = 0.032$ and $\delta_{D_2} = 0.085$. Based on our tests, we
Fig. 5. — The quadrupole $\xi_2(s)$ (left panel) and the hexadecapole $\xi_4(s)$ (right panel) from different correction methods.
Symbols are similar to those in Figure 4.

Fig. 6. — Dependence of the ratio $w_p/w_{p,\rm true}$ on the over-density
of all galaxies, $\delta_{\rm all}$, and that of $D_2$ galaxies, $\delta_{D_2}$, in tile overlap
regions at various scales $r_p$. In each panel, the open circles are the
measurements from the 40 LasDamas mocks and the solid line is
the linear least-squares fit to the data.



expect that the density effect in such a sample is not a
concern. We have also verified that the angular 2PCFs
of the overlap and non-overlap regions show consistent
clustering amplitudes in the CMASS DR9 sample,
confirming that the density effect can likely be ignored.



The density effect is not significant in our mocks, but
there may exist another subtle density-related effect in
observational data. If the $D_2'$ galaxies in the overlap regions
and the fiber-collided $D_2$ galaxies are not exactly the
same type of galaxies, given the difference in environment,
they might have a systematic difference in clustering. We
use the CMASS DR9 sample to test this and find that the
$D_2'$ and $D_2$ galaxies do have the same color and apparent
magnitude distributions. This is not unexpected, given that
the tiling algorithm itself is just an angular selection and
does not involve any physical properties of the targeted
galaxies.

As a whole, our investigation suggests that the density 
difference between the tile overlap regions and the whole 
survey region does not introduce a noticeable systematic 
effect in our method. 

5.2. Effect of Collision Groups 

There is another effect that can violate the "representative"
assumption. Galaxies in the $D_2$ population can be
associated with different collision groups. For $D_2$ galaxies
that are part of collision pairs only (N.B. all such
pairs are $D_1$-$D_2$ or 1-2 pairs given that the number of
$D_1$ galaxies is maximized in our division), $D_2'$ galaxies
in overlap regions are certainly a random subset of such
$D_2$ galaxies. There are also $D_2$ galaxies that are part
of collision triplets where the other two galaxies of each
triplet are part of $D_1$. For such $D_2$ galaxies in $D_1$-$D_2$-$D_1$
(1-2-1) triplets, $D_2'$ galaxies in overlap regions are also a
random subset of such $D_2$ galaxies. So the count of pairs
including such $D_2$ galaxies can be well reproduced with
$D_2'$ galaxies in overlap regions, for the above two cases.
However, for $D_2$ galaxies in collision groups, where more
than two such galaxies of each group collide with each
other, these $D_2 D_2$ galaxy pairs would not be appropriately
recovered in most overlap regions. Since galaxy
triplets are expected to dominate over higher multiplets,
we focus our discussion here on them.

Again, given that we maximize the number of $D_1$
galaxies, the collision triplets are either of 1-2-1 type or
of 1-2-2 type. The latter kind cannot be fully recovered
in $N_{\rm tile} = 2$ regions (only in $N_{\rm tile} > 2$ regions) and is
thus not appropriately represented when measuring the
clustering. Moreover, these $D_2$ galaxies might be clustered
differently than those in the previous two cases (1-2
pairs and 1-2-1 triplets). In our mocks, the fractions of $D_2$
galaxies in 1-2 pairs, 1-2-1 triplets, and 1-2-2 triplets are
70%, 6% and 22%, respectively. The collided galaxies in
such 1-2-2 triplets thus make up only about 1.2% (22%
of 5.5%) of the total number of galaxies. This fraction
is even smaller, about 0.7%, in the CMASS DR9 sample.
However, they can still have an adverse effect on the
clustering measurements.

In order to assess this, we modify the way fibers are
assigned to the galaxies in our mocks to mimic the real
tiling algorithm. In Section 4.2, the recovered $D_2'$ galaxies
in each sector are randomly selected according to the
prescribed fraction. We now assign fibers to $D_2$ galaxies
in each colliding group according to the number of tiles
covering that sector. For example, for three galaxies
colliding together in $N_{\rm tile} = 2$ regions, at most two of them
would be assigned fibers, one from $D_1$ and the other from
$D_2'$.
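
The counting rule in this modified assignment can be illustrated with a deliberately simplified sketch. It assumes a group in which all members collide with one another and which contains exactly one $D_1$ galaxy, so that each tile can place a fiber on only one member; the function name is hypothetical, and a real tiling code handles more general group geometries.

```python
def resolved_d2_in_group(n_d2_in_group, n_tiles):
    """Number of D_2 galaxies receiving fibers in one mutually colliding group.

    Assumes the group holds exactly one D_1 galaxy, which takes the fiber on
    the first tile; every additional tile can resolve one more group member.
    """
    if n_tiles < 1:
        raise ValueError("a sector must be covered by at least one tile")
    return min(n_d2_in_group, n_tiles - 1)

# A 1-2-2 triplet (one D_1 and two D_2 galaxies) under different tile coverage.
for n_tiles in (1, 2, 3):
    n_res = resolved_d2_in_group(n_d2_in_group=2, n_tiles=n_tiles)
    print(f"N_tile = {n_tiles}: {n_res} of 2 D_2 galaxies resolved")
```

With two tiles only one of the two $D_2$ galaxies is resolved, which is why the $D_2 D_2$ pair in a 1-2-2 triplet is missed outside regions covered by three or more tiles.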

Figure 7 shows the 2PCFs and the decomposition
components for our method without any correction for such
a "triplet effect" (green lines) compared with the true
ones (open symbols and black dotted lines). It clearly
shows that, in this case, due to the missing $D_2 D_2$ pairs,
the $\xi_{22}$ components on scales below the fiber collision
scale are significantly decreased. The $\xi_{22}$ term
contributes around 15% of $w_p$ on small scales, so without
any correction for this the resulting $w_p$ is now
significantly underestimated below the fiber collision scale,
and $\xi(s)$ is also affected.

A natural solution to correct for this systematic effect
is to extend Equation (11) to three populations:
$D_1$, $D_2$ and $D_3$, where $D_2$ here refers to the population
of galaxies that collide with $D_1$ but do not collide with each
other, and $D_3$ is the rest of the galaxies, corresponding
to higher-rank collision groups. Equations (11) and (12)
are then revised as,
We refer to this correction method as the "1-2-3 fix" below.
The limitation of this correction is the small number of
resolved $D_3'$ galaxies, since one needs regions covered by
three or more tiles to recover $D_3$ galaxies, and these
occupy only a small fraction of the full survey region.
Therefore such a correction method would have large sample
variance.
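
For orientation, the pair-count bookkeeping behind such a three-population extension can be sketched by analogy with the two-population case. The expressions below are an assumed form, written with the same count weighting as the cross-correlation decomposition in the Appendix, and are not a reproduction of the published equations; $N_3$, $\xi_{13}$, $\xi_{23}$ and $\xi_{33}$ are symbols introduced here for illustration, with $N = N_1 + N_2 + N_3$ and primes marking the subsets with measured redshifts.

```latex
\begin{align}
N^2\,\xi &= N_1^2\,\xi_{11} + N_2^2\,\xi_{22} + N_3^2\,\xi_{33}
          + 2N_1N_2\,\xi_{12} + 2N_1N_3\,\xi_{13} + 2N_2N_3\,\xi_{23} \\
         &\approx N_1^2\,\xi_{11} + N_2^2\,\xi_{2'2'} + N_3^2\,\xi_{3'3'}
          + 2N_1N_2\,\xi_{12'} + 2N_1N_3\,\xi_{13'} + 2N_2N_3\,\xi_{2'3'}
\end{align}
```

The second line replaces each component involving $D_2$ or $D_3$ by its counterpart measured from the resolved subsets, which is where the small number of $D_3'$ galaxies enters.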

Another way of correcting for the collision triplets is
to still use Equation (12) but simply add to the $D_2' D_2'$ pair
counts the estimated missing pair counts in the tile overlap
regions. The missing 2-2 pairs in 1-2-2 triplets are statistically
equivalent to 1-2 pairs in such triplets. We therefore
account for the unrepresented $D_2 D_2$ close pairs using the
recovered $D_1 D_2'$ pairs in these triplets. We denote the $D_2$
galaxies (the definition of the $D_2$ population here still follows
Equation 11) in 1-2-2 collision triplets as $D_t$ and the
recovered $D_2$ galaxies in such triplets as $D_t'$. The total
number of $D_2$-$D_2$ close pairs in each $(r_p, \pi)$ bin below the
fiber collision scale in overlap regions can then be written
as follows:
where $N_t$ and $N_t'$ are the numbers of $D_t$ and $D_t'$ galaxies
in the tile overlap regions, and $N_2^{\rm ol}$ and $N_{2'}^{\rm ol}$ are the total
and recovered numbers of $D_2$ galaxies in overlap regions,
respectively. The first term in Equation (19) represents
the resolved $D_2'$-$D_2'$ pairs in the sample (typically only in
$N_{\rm tile} > 2$ regions), while the second term is an estimate of
the missing close pairs (mostly in $N_{\rm tile} = 2$ regions). The
expression $(N_t/N_t')\,N_{D_1 D_t'}$ is the total number of 1-2 pairs
in 1-2-2 collision triplets, and dividing it by two gives the
expected number of 2-2 pairs in such triplets (since a
1-2-2 triplet has two 1-2 sides and one 2-2 side). Note that
for most triplets in overlap regions, $N_t/N_t' = 2$, so this
simply adds a missing $D_2' D_2'$ pair for each $D_1 D_t'$ pair.
The correction term itself is then the second term in the
square brackets in Equation (20), adding the normalized
missing pairs in 1-2-2 triplets below the fiber collision
scale, and we refer to it as the "simple fix".
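
The estimate of the missing close pairs described above can be sketched in a few lines. This illustrates only the second term as it is described in words, namely half of $(N_t/N_t')\,N_{D_1 D_t'}$ in a given bin; the normalization applied in the full estimator is not reproduced here, and the function and argument names are hypothetical.

```python
def missing_d2d2_pairs(n_d1_dt_resolved_pairs, n_t, n_t_resolved):
    """Estimated number of missing 2-2 close pairs in 1-2-2 triplets for one bin.

    n_d1_dt_resolved_pairs : counted D_1 - D_t' pairs in the bin
    n_t                    : number of D_t galaxies in the overlap regions
    n_t_resolved           : number of D_t' (resolved) galaxies there
    """
    if n_t_resolved <= 0:
        raise ValueError("need at least one resolved D_t' galaxy")
    # Scale the resolved 1-2 pairs up to all 1-2 pairs in such triplets.
    total_12_pairs = (n_t / n_t_resolved) * n_d1_dt_resolved_pairs
    # A 1-2-2 triplet has two 1-2 sides and one 2-2 side.
    return 0.5 * total_12_pairs

# Typical N_tile = 2 situation: half of the D_t galaxies are resolved.
print(missing_d2d2_pairs(n_d1_dt_resolved_pairs=100, n_t=200, n_t_resolved=100))
```

When $N_t/N_t' = 2$ the function returns one estimated 2-2 pair per counted $D_1 D_t'$ pair, in line with the remark in the text.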

Figure 7 also shows the results of applying these two
methods to the mocks with the modified fiber assignment.
Both corrections alleviate most of the systematics.
The "1-2-3" fix is in better agreement with the
underlying measurements (in particular for $\xi(s)$), but
as discussed above, it has large sample variance, leading
to a ~50% increase in the error bars. The simple
fix removes the bulk of the systematics, but a deficit of a
few percent remains on small scales. This could be caused
by the complicated structure of the high-order colliding
groups, remaining systematics associated with the $D_1 D_2$
pair counts, and sample variance.

Although the two correction methods proposed here
are not as accurate as in the simple tiled case, they still
provide workable estimates of the true 2PCFs. We regard
the simple fix as the more practical one, due to its
simple application and the increased sample variance of
the "1-2-3" fix. We thus advocate simply incorporating
the additional term (Equation 20) into the overall
method. We have explored many alternative corrections
to the issue of collision groups, with varying complexity
and success, and will study it further when a more
realistic set of tiled mocks or a complete fiber-collision-free
subsample in the BOSS ancillary program becomes available.
We note that, as mentioned above, the LasDamas
mocks have in fact ~60% more close triplets than the
real CMASS sample. Thus the magnitude of the effect
presented here and any residual systematics are conservative
and are likely smaller in the real data.
Fig. 7. — Effect of collision triplets on $w_p(r_p)$ and $\xi(s)$, and comparison between the two correction methods. The unresolved $D_2$ galaxies
in collision triplets, if not corrected, lead to an underestimate of the 2PCF on small scales (green). The "1-2-3" fix (red) and the simple fix
(blue) methods can help to reduce the effect, and the latter has smaller sample variance. In the top panels, individual components of the
2PCF decomposition are plotted for the true case (black dotted lines) and for our corrections (blue and green dashed lines).




Fig. 8. — Effect of collision triplets on the nearest neighbor and angular correction methods. The lines are the same as in Figure 4, but
we adopt the simple fix as our correction.
We also test the influence of the triplet effect in the
mocks on the nearest neighbor and angular correction
methods, as shown in Figure 8. The nearest neighbor
method is mostly unaffected by this, still recovering the
correct correlation function on large scales. The angular
correction is impacted by the triplet effect, especially for
the redshift-space correlation function, though to a lesser
extent than our method. Based on our tests, the overall
accuracy of our method appears to still be better than
that of the angular correction and nearest neighbor methods.

6. SUMMARY AND DISCUSSION 

In this paper, we present a novel method for correcting
the effects of fiber collisions in galaxy clustering
statistics, utilizing resolved fiber collisions in tile overlap
regions. The key element is dividing the target galaxy
sample into two distinct populations influenced differently by
the fiber collisions and combining their contributions
according to Equation (12). The collided galaxies, making
up our so-called Population 2 subsample, are partially
resolved, mainly in the tile overlap regions. These
resolved collided galaxies allow us to recover and measure
the clustering statistics on all scales. The distinct spatial
distribution of the galaxies in Population 2 which were
assigned fibers can be properly accounted for by their
angular completeness $N_2'/N_2$ in each individual tile
sector. We explain the theoretical basis for our method and
extensively test it on realistic mock catalogs. We
demonstrate that both the projected and the 3D redshift-space
2PCFs can be very well recovered, assuming that the
recovered Population 2 galaxies are representative. With
that assumption, the correction method is accurate and
unbiased and limited only by sample variance.

In real observations, the density variations between
overlap and non-overlap regions and the non-random
selection of the Population 2 galaxies in higher colliding
groups are the main potential systematic effects. While
the density effect can be largely ignored, as we
demonstrate in the mocks, the influence of higher colliding
groups should be carefully taken into account in order
to provide a reasonable estimate of the true 2PCFs. We
propose a simple fix for the missing close pairs caused
by such colliding groups. The remaining systematic
effect on small-scale 2PCFs from such a method is expected
to be lower than ~5%. The exact values of the systematic
errors are hard to obtain, given that we are limited
by the sample variance on small scales. By shifting the
plate placement of the tiling geometry, we find that the
deviation from the true 2PCF based on our correction
method is likely to be below 3%. To be conservative, we
quote 5% as an upper limit of the systematic errors for
our method.

We also contrast our method with the commonly used
nearest-neighbor and angular corrections. While these
approximations work well for specific statistics on some
scales, they are generally not accurate enough to give
an unbiased estimate over all scales. For the
nearest-neighbor correction, assigning the collided galaxy the
redshift of its neighbor mainly influences the line-of-sight
separation $\pi$, while $r_p$ is only changed because of the
non-plane-parallel effect. For projected statistics like $w_p(r_p)$,
on scales larger than the fiber-collision scale, the pair
counts are dominated by non-collided galaxies, and the
nearest neighbor correction works extremely well. On
small scales, however, the pair counts are dominated by
collided galaxies. The nearest neighbor redshift
assignment leads to an overestimate of the number of pairs
within the line-of-sight separation $\pi_{\max}$ and thus an
overestimate of $w_p(r_p)$, which causes the nearest neighbor
correction to fail. For $\xi(s)$, the nearest-neighbor
correction fails both below and above the fiber-collision scale, since
having the correct line-of-sight separation is more
crucial, and the method does worse on all scales smaller
than $15\,h^{-1}$ Mpc. The nearest-neighbor correction,
however, is still a good estimate on very large scales.

The angular correction method provides a better
estimate on small scales than the nearest-neighbor one,
but it is still only an approximation. By definition, the
angular 2PCF $w(\theta)$ is obtained via projecting over the
whole line-of-sight depth of the survey, while the
projected 2PCF $w_p(r_p)$ is obtained from projection within a
line-of-sight separation of $\pi_{\max}$. Therefore, $w(\theta)$ and
$w_p(r_p)$ are not expected to be exactly the same. On small
scales, the effect of the difference in projection depth is
negligible and the angular correction method works
reasonably well, while on large scales this is no longer the case.
In addition, for a survey with large line-of-sight depth,
the mapping from $\theta$ onto $r_p$ is not unique because of the
non-plane-parallel effect, further complicating the
correspondence between $w(\theta)$ and $w_p(r_p)$ and making the
angular correction method less accurate. We find that
our correction is generally superior, better theoretically
motivated and more broadly applicable, especially in its
power to recover the 3D correlation functions. Since the
angular correction relies on the measured redshifts of the
galaxies in the catalog, the systematic effects of the
density variation and higher colliding groups also affect the
accuracy of the angular correction on small scales.

Our correction method can be directly applied to
flux-limited survey samples. When dealing with a subsample
of galaxies in certain redshift, luminosity or color bins,
e.g., volume-limited samples, the determination of the $D_1$
and $D_2$ populations is not as obvious, since we do not
have redshifts for the missing galaxies. But as the two
populations are only separated based on their angular
distribution, which is independent of redshift, luminosity
or color, the determination of the $D_1$ and $D_2$ populations
should be made in the parent full sample, where all the
galaxies satisfy the same selection criteria. The $N_2'/N_2$
fractions in each sector are then also determined in the
parent sample. The total number of $D_2$ galaxies in each
subsample is calculated by summing up the expected number of
$D_2$ galaxies in each sector, $N_{2,\rm sec} = N_{2',\rm sec}\,N_2/N_2'$. Based
on a test of subsamples of galaxies in different color bins
in CMASS DR9, we find that the color distribution of $D_2'$
galaxies determined in this way is the same as that of the
$D_2$ galaxies, supporting our assertion that the division
into the $D_1$ and $D_2$ populations is independent of the
physical properties of the galaxies.
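
The per-sector bookkeeping for a subsample can be sketched as follows. The sketch assumes that the number of resolved $D_2'$ galaxies of the subsample and the parent-sample completeness $N_2'/N_2$ are both available per sector as dictionaries; the function name and the toy values are hypothetical.

```python
def expected_n2(n2_resolved_by_sector, completeness_by_sector):
    """Expected total number of D_2 galaxies in a subsample.

    n2_resolved_by_sector  : dict sector -> number of D_2' galaxies of the
                             subsample found in that sector
    completeness_by_sector : dict sector -> N_2'/N_2 from the parent sample
    """
    total = 0.0
    for sector, n_resolved in n2_resolved_by_sector.items():
        completeness = completeness_by_sector[sector]
        if completeness <= 0:
            raise ValueError(f"sector {sector} has no resolved D_2 galaxies")
        # Expected D_2 count in this sector: N_2',sec * N_2 / N_2'.
        total += n_resolved / completeness
    return total

# Toy example with two sectors.
n2_sub = {1: 5, 2: 10}
completeness = {1: 0.5, 2: 1.0}
print(expected_n2(n2_sub, completeness))
```

Sectors whose parent-sample completeness is zero carry no resolved galaxies to scale up, so the sketch raises an error for them rather than guessing a value.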

In developing the new method, we have considered 
many other alternative corrections, such as using photo- 
metric redshifts, imposing fiber-collisions on the random 
catalog, applying different weights on the galaxies. We 
have also considered different variants in our definition 
of Populations 1 and 2. However, none of the many vari- 
ants we have tried succeeded in robustly recovering the 
underlying clustering. Our new method, rooted in solid 
theoretical ground, proves to work successfully. 
For the early CMASS-like mock catalogs used in this
work (~50,000 galaxies in a volume of ~0.16 $h^{-3}$ Gpc$^3$)
with imposed tiling, our correction method reaches a
statistical accuracy of ~6% at $r_p \sim 0.1\,h^{-1}$ Mpc and
~2.5% at $r_p \sim 30\,h^{-1}$ Mpc for $w_p(r_p)$, and systematic
errors of < 5% on small scales. The statistical errors
are essentially caused by the sample variance, and
thus will be reduced for larger volumes, scaling down by
roughly the square root of the volume. We have verified
that this scaling law holds when using subsamples
of smaller volume, and the fluctuation around unity of
the ratio of 2PCFs averaged over the 40 mocks in Fig- 
ure S] also shows that it scales down accordingly. The 
current SDSS-III BOSS DR9 sample already covers an
area of about 3500 deg$^2$, roughly 6 times larger than our
mocks, so our statistical correction error will be about 2.4 times
smaller when applied to this sample. The final survey
will cover about 10000 deg$^2$, and the correction error in
$w_p$ from sample variance is expected to be ~1.5% at
$r_p \sim 0.1\,h^{-1}$ Mpc and 0.6% at $r_p \sim 30\,h^{-1}$ Mpc. The
residual systematics may also scale down somewhat with
the decreased sample variance, and we plan to develop a
better treatment of them in future work.
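
The quoted numbers can be checked with the square-root-of-volume scaling. The sketch below assumes that the volume ratio equals the ratio of sky areas (that is, the same redshift range), which is an assumption of this example; the function and variable names are illustrative.

```python
import math

def scaled_error(err_one_mock, volume_ratio):
    """Statistical error after scaling by the square root of the volume ratio."""
    return err_one_mock / math.sqrt(volume_ratio)

# Area (taken here as volume) of each sample relative to one mock.
dr9_ratio = 6.0                              # DR9 is roughly 6 times one mock
final_ratio = dr9_ratio * 10000.0 / 3500.0   # final survey relative to one mock

print(f"DR9 error reduction factor: {math.sqrt(dr9_ratio):.1f}")
print(f"final survey, small scales: {scaled_error(6.0, final_ratio):.2f}%")
print(f"final survey, large scales: {scaled_error(2.5, final_ratio):.2f}%")
```

Starting from the single-mock errors of 6% and 2.5%, the sketch gives values close to the 1.5% and 0.6% quoted for the final survey.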

The method thus enables more accurate measurements 
of galaxy clustering statistics on small and intermediate 
scales. In particular, it will enable us to reliably extend 
the measurements to smaller scales than obtained before 
and to recover the full 3D redshift-space correlation func- 
tions on small scales. This will allow a better probe of 
the distribution of galaxies within halos. HOD modeling 
of these new measurements will provide new constraints 
on the spatial and velocity distributions of galaxies in- 
side halos. Measurement and modeling of redshift-space 
distortions in both the linear and non-linear regimes 
will also improve constraints on cosmological parame- 
ters. These applications will be explored in future work. 
We note, however, that as our method is associated with
increased sample variance, due to the limited sky coverage
of $D_2'$ galaxies, it might not be ideal for very large
scale clustering measurements, such as those performed
for measuring the baryon acoustic oscillation signature
(Eisenstein et al. 2005b). In such cases, other methods
such as the nearest neighbor correction can be considered.

The new method will allow for reliable galaxy clustering
measurements in current and future surveys. While
we were motivated by upcoming measurements in the
SDSS-III BOSS survey, and tested the method on
corresponding mock catalogs, the method can be applied to
any large fiber-fed survey, such as the SDSS-I and II,
2dFGRS, and the planned BigBOSS (Schlegel et al. 2011).
We have focused here on two-point auto-correlation
functions, but our methodology is more broadly applicable
to other related statistics. It can be easily extended
to cross-correlation functions (see the Appendix), which
would not suffer from the systematic effect of higher-order
colliding groups if the two samples of targets
come from different surveys. The correction method can
be further generalized to higher-order statistics, e.g., the
three-point correlation function (Jing & Börner 2004;
Kayo et al. 2004; McBride et al. 2011; H. Guo et al.
2012, in prep.), with the effect of collision groups treated
more accurately. The method can certainly be helpful in
accurately measuring statistics in Fourier space as well,
such as the power spectrum and bispectrum of galaxies.
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Jeremy Tinker, David Wake and David Weinberg for 
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APPENDIX 

Our new method for fiber-collision correction is also valuable for the measurement of cross-correlation functions. For
cross-correlation functions, the key equation (12) is revised accordingly. When cross-correlating two samples $a$ and $b$,
both are divided into Populations 1 and 2. The decomposition equation becomes

$$N_a N_b\,\xi = N_{a_1} N_{b_1}\,\xi_{a_1 b_1} + N_{a_1} N_{b_2}\,\xi_{a_1 b_2} + N_{a_2} N_{b_1}\,\xi_{a_2 b_1} + N_{a_2} N_{b_2}\,\xi_{a_2 b_2} \qquad (1)$$
$$\approx N_{a_1} N_{b_1}\,\xi_{a_1 b_1} + N_{a_1} N_{b_2}\,\xi_{a_1 b_2'} + N_{a_2} N_{b_1}\,\xi_{a_2' b_1} + N_{a_2} N_{b_2}\,\xi_{a_2' b_2'}, \qquad (2)$$

where $N_{a_1}$ ($N_{b_1}$) and $N_{a_2}$ ($N_{b_2}$) are the numbers of galaxies in Populations 1 and 2 in sample $a$ ($b$), and $a_2'$ and $b_2'$ denote
the subsets of $a_2$ and $b_2$ for which redshifts were obtained, analogous to $D_2'$ and $D_2$.
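
The combination in Equation (2) amounts to a count-weighted sum of four measured components. A minimal sketch, assuming the component cross-correlations have already been measured (with the primed subsets standing in for the full Population 2 samples) and that $N_a = N_{a_1} + N_{a_2}$ and $N_b = N_{b_1} + N_{b_2}$; the function and argument names are hypothetical.

```python
def combine_cross(n_a1, n_a2, n_b1, n_b2, xi_11, xi_12, xi_21, xi_22):
    """Combine component cross-correlations into the full cross-correlation.

    n_a1, n_a2 : numbers of Population 1 and 2 galaxies in sample a
    n_b1, n_b2 : numbers of Population 1 and 2 galaxies in sample b
    xi_11      : measured xi between a_1 and b_1
    xi_12      : measured xi between a_1 and b_2'
    xi_21      : measured xi between a_2' and b_1
    xi_22      : measured xi between a_2' and b_2'
    Works for scalars or for NumPy arrays of values per separation bin.
    """
    n_a = n_a1 + n_a2
    n_b = n_b1 + n_b2
    weighted = (n_a1 * n_b1 * xi_11 + n_a1 * n_b2 * xi_12
                + n_a2 * n_b1 * xi_21 + n_a2 * n_b2 * xi_22)
    return weighted / (n_a * n_b)

# If every component has the same amplitude, the combination returns it unchanged.
print(combine_cross(90, 10, 80, 20, 2.0, 2.0, 2.0, 2.0))
```

Note that the weights use the full Population 2 counts, not the numbers of resolved galaxies, since the resolved subsets only serve to measure the component correlation functions.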